{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5-7 优化器optimizers\n",
    "\n",
    "机器学习界有一群炼丹师，他们每天的日常是：\n",
    "\n",
    "拿来药材（数据），架起八卦炉（模型），点着六味真火（优化算法），就摇着蒲扇等着丹药出炉了。\n",
    "\n",
    "不过，当过厨子的都知道，同样的食材，同样的菜谱，但火候不一样了，这出来的口味可是千差万别。火小了夹生，火大了易糊，火不匀则半生半糊。\n",
    "\n",
    "机器学习也是一样，模型优化算法的选择直接关系到最终模型的性能。有时候效果不好，未必是特征的问题或者模型设计的问题，很可能就是优化算法的问题。\n",
    "\n",
    "深度学习优化算法大概经历了 SGD -> SGDM -> NAG ->Adagrad -> Adadelta(RMSprop) -> Adam -> Nadam 这样的发展历程。\n",
    "\n",
    "详见《一个框架看懂优化算法之异同 SGD/AdaGrad/Adam》\n",
    "\n",
    "https://zhuanlan.zhihu.com/p/32230623\n",
    "\n",
    "对于一般新手炼丹师，优化器直接使用Adam，并使用其默认参数就OK了。\n",
    "\n",
    "一些爱写论文的炼丹师由于追求评估指标效果，可能会偏爱前期使用Adam优化器快速下降，后期使用SGD并精调优化器参数得到更好的结果。\n",
    "\n",
    "此外目前也有一些前沿的优化算法，据称效果比Adam更好，例如LazyAdam, Look-ahead, RAdam, Ranger等."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 一、优化器的使用\n",
    "\n",
    "优化器主要使用apply_gradients方法传入变量和对应梯度从而来对给定变量进行迭代，或者直接使用minimize方法对目标函数进行迭代优化。\n",
    "\n",
    "当然，更常见的使用是在编译时将优化器传入keras的Model,通过调用model.fit实现对Loss的的迭代优化。\n",
    "\n",
    "初始化优化器时会创建一个变量optimier.iterations用于记录迭代的次数。因此优化器和tf.Variable一样，一般需要在@tf.function外创建。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import numpy as np \n",
    "\n",
    "#打印时间分割线\n",
    "@tf.function\n",
    "def printbar():\n",
    "    ts = tf.timestamp()\n",
    "    today_ts = ts%(24*60*60)\n",
    "\n",
    "    hour = tf.cast(today_ts//3600+8,tf.int32)%tf.constant(24)\n",
    "    minite = tf.cast((today_ts%3600)//60,tf.int32)\n",
    "    second = tf.cast(tf.floor(today_ts%60),tf.int32)\n",
    "    \n",
    "    def timeformat(m):\n",
    "        if tf.strings.length(tf.strings.format(\"{}\",m))==1:\n",
    "            return(tf.strings.format(\"0{}\",m))\n",
    "        else:\n",
    "            return(tf.strings.format(\"{}\",m))\n",
    "    \n",
    "    timestring = tf.strings.join([timeformat(hour),timeformat(minite),\n",
    "                timeformat(second)],separator = \":\")\n",
    "    tf.print(\"==========\"*8,end = \"\")\n",
    "    tf.print(timestring)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "================================================================================17:43:39\n",
      "step =  100\n",
      "x =  0.867380381\n",
      "\n",
      "================================================================================17:43:39\n",
      "step =  200\n",
      "x =  0.98241204\n",
      "\n",
      "================================================================================17:43:39\n",
      "step =  300\n",
      "x =  0.997667611\n",
      "\n",
      "================================================================================17:43:39\n",
      "step =  400\n",
      "x =  0.999690711\n",
      "\n",
      "================================================================================17:43:39\n",
      "step =  500\n",
      "x =  0.999959\n",
      "\n",
      "================================================================================17:43:39\n",
      "step =  600\n",
      "x =  0.999994516\n",
      "\n",
      "y = 0\n",
      "x = 0.999995232\n"
     ]
    }
   ],
   "source": [
    "# 求f(x) = a*x**2 + b*x + c的最小值\n",
    "\n",
    "# 使用optimizer.apply_gradients\n",
    "\n",
    "x = tf.Variable(0.0,name = \"x\",dtype = tf.float32)\n",
    "optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)\n",
    "\n",
    "@tf.function\n",
    "def minimizef():\n",
    "    a = tf.constant(1.0)\n",
    "    b = tf.constant(-2.0)\n",
    "    c = tf.constant(1.0)\n",
    "    \n",
    "    while tf.constant(True): \n",
    "        with tf.GradientTape() as tape:\n",
    "            y = a*tf.pow(x,2) + b*x + c\n",
    "        dy_dx = tape.gradient(y,x)\n",
    "        optimizer.apply_gradients(grads_and_vars=[(dy_dx,x)])\n",
    "        \n",
    "        #迭代终止条件\n",
    "        if tf.abs(dy_dx)<tf.constant(0.00001):\n",
    "            break\n",
    "            \n",
    "        if tf.math.mod(optimizer.iterations,100)==0:\n",
    "            printbar()\n",
    "            tf.print(\"step = \",optimizer.iterations)\n",
    "            tf.print(\"x = \", x)\n",
    "            tf.print(\"\")\n",
    "                \n",
    "    y = a*tf.pow(x,2) + b*x + c\n",
    "    return y\n",
    "\n",
    "tf.print(\"y =\",minimizef())\n",
    "tf.print(\"x =\",x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch =  1000\n",
      "y =  0\n",
      "x =  0.999998569\n"
     ]
    }
   ],
   "source": [
    "# 求f(x) = a*x**2 + b*x + c的最小值\n",
    "\n",
    "# 使用optimizer.minimize\n",
    "\n",
    "x = tf.Variable(0.0, name=\"x\", dtype=tf.float32)\n",
    "optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)   \n",
    "\n",
    "def f():   \n",
    "    a = tf.constant(1.0)\n",
    "    b = tf.constant(-2.0)\n",
    "    c = tf.constant(1.0)\n",
    "    y = a*tf.pow(x,2) + b*x + c\n",
    "    return(y)\n",
    "\n",
    "@tf.function\n",
    "def train(epoch = 1000):  \n",
    "    for _ in tf.range(epoch):  \n",
    "        optimizer.minimize(f,[x])\n",
    "    tf.print(\"epoch = \",optimizer.iterations)\n",
    "    return(f())\n",
    "\n",
    "train(1000)\n",
    "tf.print(\"y = \", f())\n",
    "tf.print(\"x = \", x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"fake_model\"\n",
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "Total params: 1\n",
      "Trainable params: 1\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n",
      "Train on 100 samples\n",
      "Epoch 1/10\n",
      "100/100 [==============================] - 0s 2ms/sample - loss: 0.2481\n",
      "Epoch 2/10\n",
      "100/100 [==============================] - 0s 680us/sample - loss: 0.0044\n",
      "Epoch 3/10\n",
      "100/100 [==============================] - 0s 590us/sample - loss: 7.6740e-05\n",
      "Epoch 4/10\n",
      "100/100 [==============================] - 0s 533us/sample - loss: 1.3500e-06\n",
      "Epoch 5/10\n",
      "100/100 [==============================] - 0s 630us/sample - loss: 1.8477e-08\n",
      "Epoch 6/10\n",
      "100/100 [==============================] - 0s 612us/sample - loss: 0.0000e+00\n",
      "Epoch 7/10\n",
      "100/100 [==============================] - 0s 639us/sample - loss: 0.0000e+00\n",
      "Epoch 8/10\n",
      "100/100 [==============================] - 0s 642us/sample - loss: 0.0000e+00\n",
      "Epoch 9/10\n",
      "100/100 [==============================] - 0s 627us/sample - loss: 0.0000e+00\n",
      "Epoch 10/10\n",
      "100/100 [==============================] - 0s 607us/sample - loss: 0.0000e+00\n"
     ]
    }
   ],
   "source": [
    "# 求f(x) = a*x**2 + b*x + c的最小值\n",
    "# 使用model.fit\n",
    "\n",
    "tf.keras.backend.clear_session()\n",
    "\n",
    "class FakeModel(tf.keras.models.Model):\n",
    "    def __init__(self,a,b,c):\n",
    "        super(FakeModel,self).__init__()\n",
    "        self.a = a\n",
    "        self.b = b\n",
    "        self.c = c\n",
    "    \n",
    "    def build(self):\n",
    "        self.x = tf.Variable(0.0,name = \"x\")\n",
    "        self.built = True\n",
    "    \n",
    "    def call(self,features):\n",
    "        loss  = self.a*(self.x)**2 + self.b*(self.x) + self.c\n",
    "        return(tf.ones_like(features)*loss)\n",
    "    \n",
    "def myloss(y_true,y_pred):\n",
    "    return tf.reduce_mean(y_pred)\n",
    "\n",
    "model = FakeModel(tf.constant(1.0), tf.constant(-2.0), tf.constant(1.0))\n",
    "\n",
    "model.build()\n",
    "model.summary()\n",
    "\n",
    "model.compile(optimizer = \n",
    "              tf.keras.optimizers.SGD(learning_rate=0.01), loss = myloss)\n",
    "history = model.fit(tf.zeros((100,2)),tf.ones(100), \n",
    "                    batch_size=1, epochs=10)  #迭代1000次"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x= 0.999998569\n",
      "loss= 0\n"
     ]
    }
   ],
   "source": [
    "tf.print(\"x=\",model.x)\n",
    "tf.print(\"loss=\",model(tf.constant(0.0)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 二、内置优化器\n",
    "\n",
    "深度学习优化算法大概经历了 SGD -> SGDM -> NAG ->Adagrad -> Adadelta(RMSprop) -> Adam -> Nadam 这样的发展历程。\n",
    "\n",
    "在keras.optimizers子模块中，它们基本上都有对应的类的实现。\n",
    "\n",
    "* SGD, 默认参数为纯SGD, 设置momentum参数不为0实际上变成SGDM, 考虑了一阶动量, 设置 nesterov为True后变成NAG，即 Nesterov Acceleration Gradient，在计算梯度时计算的是向前走一步所在位置的梯度。\n",
    "\n",
    "* Adagrad, 考虑了二阶动量，对于不同的参数有不同的学习率，即自适应学习率。缺点是学习率单调下降，可能后期学习速率过慢乃至提前停止学习。\n",
    "\n",
    "* RMSprop, 考虑了二阶动量，对于不同的参数有不同的学习率，即自适应学习率，对Adagrad进行了优化，通过指数平滑只考虑一定窗口内的二阶动量。\n",
    "\n",
    "* Adadelta, 考虑了二阶动量，与RMSprop类似，但是更加复杂一些，自适应性更强。\n",
    "\n",
    "* Adam, 同时考虑了一阶动量和二阶动量，可以看成RMSprop上进一步考虑了Momentum。\n",
    "\n",
    "* Nadam, 在Adam基础上进一步考虑了 Nesterov Acceleration。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
